Getting started with R

From the very basics

Dr. Carina Nigg & Dr. Judith Bouman

Introduction

Disclaimer

This is an introductory course and we expect no prior knowledge

Adjust your speed to your own level

Ask questions at any time

Tell us if you are bored or overwhelmed

Course content

Introduction to R and RStudio

  • Understanding R and RStudio
  • Using basic functions
  • Writing a first script
  • Understanding packages
  • Importing data
  • Using basic functions on imported data

Analyzing your own data

  • Organizing your data
  • Loading your data into R
  • Providing an overview of your data
  • Inspecting missing data
  • Checking plausibility of your data

The program

Day   Time          Topic
27th  9.00 - 9.30   R and RStudio, why?
27th  9.00 - 9.30
27th  9.00 - 9.30

Final exercise

  • Choose your own dataset
  • Organize your data folder
  • Read the data into the R environment using an R script
  • Select 4 variables (at least one numerical and one categorical)
  • Find out which classes your variables have and, if necessary, convert them to the most sensible class
  • Check all variables for missing values
  • Give summary statistics
    • For categorical variables, give a table with counts and percentages
    • For numerical variables, give a table with mean/median, sd, min and max
  • Visualize
    • For numerical variables: histogram/boxplot
    • For categorical variables: bar chart

Grading: Pass or fail (0.5 ECTS)
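For the missing-value and summary-statistics steps of the final exercise, a minimal sketch in base R could look like this. The data frame `df` and its values are made up for illustration; with your own dataset you would use your own variable names.

```r
# Hypothetical example data: one missing value (NA) in each column
df <- data.frame(
  age    = c(23, 35, NA, 41, 29),
  gender = factor(c("Male", "Female", "Female", NA, "Male"))
)

# Number of missing values per variable
colSums(is.na(df))

# Counts and percentages for a categorical variable (NAs are left out by default)
counts <- table(df$gender)
percentages <- prop.table(counts) * 100
counts
percentages

# Summary statistics for a numerical variable, ignoring missing values
mean(df$age, na.rm = TRUE)
median(df$age, na.rm = TRUE)
sd(df$age, na.rm = TRUE)
```

Without `na.rm = TRUE`, `mean()` and `sd()` would return `NA` as soon as one value is missing.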

Why learn R?

  • Reproducibility of your results
  • Free software (unlike Stata or SPSS)
  • Many resources available
  • Potentially useful in your further career

Disclaimers

  • First time we are teaching this course
  • We expect no prior knowledge
  • Ask questions
  • Tell us if we are too slow/fast

Installing R and RStudio

Did you all manage to install both R and RStudio?


Difference between R and RStudio

Opening RStudio

RStudio windows

Basic concepts in R

Simple calculator

1 + 1 
[1] 2
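Besides `+`, R supports the other usual arithmetic operators. A short sketch of the most common ones (all standard base R):

```r
2 * 3      # multiplication: 6
10 / 4     # division: 2.5
2^3        # exponentiation: 8
7 %/% 2    # integer division: 3
7 %% 2     # remainder (modulo): 1
sqrt(16)   # square root, written as a function: 4
```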

Using a script

  • Track code
  • Make changes
  • Repeat code (reproducible science)

Now, you can download, save and open the “follow_along_script_monday_morning.R” script.

    Code and comments

    “#” can be used to add text and comments

    # Use "#" to add text and explanations to your code 

    Keyboard shortcuts (Windows / macOS)

    • Run the current line or selection: Ctrl + Enter / Command + Enter
    • Run all code in the script: Ctrl + Alt + R / Command + Option + R
    • Interrupt running code: Esc
    • Comment/uncomment lines: Ctrl + Shift + C / Command + Shift + C

    Saving “objects”

    a = 1 
    b = 2
    c = a + b
    
    # You can use either "=" or "<-" to assign a value
    
    c <- a + b 
    
    c
    [1] 3

    Why could this be helpful?

    Example “objects”

    data = c(1,13,2,1,43,53,1,2,34,54,2,4,6,23)
    
    cutoff = 10 
    
    data[data > cutoff]
    [1] 13 43 53 34 54 23

    Using “functions”

    sum(1, 2)
    [1] 3
    sum(a, b)
    [1] 3

    Getting help with functions

    ?sum
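Help pages such as `?sum` list the arguments a function accepts. A small sketch of passing arguments by name, using two real base R arguments (`digits` of `round()` and `na.rm` of `sum()`); the vector `values` is made up:

```r
# Arguments can be given by position or by name
round(3.14159, digits = 2)   # 3.14

# The help page of sum() documents na.rm:
# by default a missing value (NA) makes the whole result NA
values <- c(1, NA, 3)
sum(values)                  # NA
sum(values, na.rm = TRUE)    # 4
```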

    Vector

    c(1,2,3)
    [1] 1 2 3
    c(a, b)
    [1] 1 2
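Calculations on a vector are applied to every element at once, and `c()` can also glue vectors together. A minimal sketch with a made-up vector `x`:

```r
x <- c(1, 2, 3)

x * 2               # element by element: 2 4 6
x + c(10, 20, 30)   # element-wise addition: 11 22 33
c(x, 4, 5)          # combine into a longer vector: 1 2 3 4 5
```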

    Vector

    How to access an element in the vector

    a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)
    
    a_vector[4]
    [1] 3.11
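The square brackets accept more than a single position. A short sketch of other common ways to select elements from `a_vector`:

```r
a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

a_vector[c(1, 3)]        # elements 1 and 3: 1.23 6.21
a_vector[2:4]            # elements 2 to 4: 2.34 6.21 3.11
a_vector[-1]             # everything except the first element
a_vector[a_vector > 5]   # elements larger than 5: 6.21 5.922 5.65
```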

    Array/matrix

    matrix(data = c(1,2,3,4,5,6,7,8), nrow = 4)
         [,1] [,2]
    [1,]    1    5
    [2,]    2    6
    [3,]    3    7
    [4,]    4    8
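As the output shows, `matrix()` fills the values column by column. A sketch of the `byrow` argument, which fills row by row instead, and of `dim()`, which reports the size:

```r
m_byrow <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, byrow = TRUE)
m_byrow
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
# [4,]    7    8

dim(m_byrow)   # number of rows and columns: 4 2
```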

    Array/matrix

    How to access an element from a matrix

    a_matrix = matrix(data = c(1,2,3,4,5,6,7,8), nrow = 4)
    
    a_matrix[3,2] # row index first, then column index
    [1] 7
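Leaving one of the two positions empty selects a whole row or a whole column. A minimal sketch with the same `a_matrix`:

```r
a_matrix = matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4)

a_matrix[3, ]   # whole third row: 3 7
a_matrix[, 2]   # whole second column: 5 6 7 8
```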

    Let's try! – Exercise 1

    Can you calculate the following for “a_vector”?

    • Mean
    • Standard deviation
    • Maximum value
    • Minimum value
    • Length of the vector
    a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)

    Take 15 minutes to try

    Let's try! – Exercise 1 - Solution

    a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)
    
    mean(a_vector)
    [1] 4.02425
    sd(a_vector)
    [1] 1.811263
    max(a_vector)
    [1] 6.21
    min(a_vector)
    [1] 1.23
    length(a_vector)
    [1] 8

    Let's try! – Exercise 2

    Can you figure out what you can do with the following functions?

    • seq()
    • rep()

    Take 10 minutes

    Let's try! – Exercise 2 - Solution

    seq(1,10)
     [1]  1  2  3  4  5  6  7  8  9 10
    seq(1,10, by = 2)
    [1] 1 3 5 7 9
    rep(0,100)
      [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
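Both functions have a few more useful arguments, all documented on their help pages (`?seq`, `?rep`). A short sketch:

```r
seq(0, 1, length.out = 5)   # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
1:10                        # shorthand for seq(1, 10)
rep(c(1, 2), times = 3)     # repeat the whole vector: 1 2 1 2 1 2
rep(c(1, 2), each = 2)      # repeat each element: 1 1 2 2
```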

    Classes and types

    Numeric and character

    a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)
    
    class(a_vector)
    [1] "numeric"
    b_vector = c("something", "something else", "another thing", "completely different")
    
    class(b_vector)
    [1] "character"
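Imported numbers sometimes end up as text (class "character"). A minimal sketch of converting them with `as.numeric()`; the values are made up. Text that is not a number would become `NA`, with a warning.

```r
numbers_as_text <- c("1.5", "2", "3.25")
class(numbers_as_text)   # "character"

numbers <- as.numeric(numbers_as_text)
class(numbers)           # "numeric"
mean(numbers)            # 2.25
```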

    Classes and types

    Logical

    c_vector = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
    
    class(c_vector)
    [1] "logical"
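Logical values count as 1 (`TRUE`) and 0 (`FALSE`) in calculations, and comparisons produce logical vectors. A short sketch:

```r
c_vector = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

sum(c_vector)    # number of TRUE values: 5
mean(c_vector)   # proportion of TRUE values: 0.625

a_vector = c(1.23, 2.34, 6.21, 3.11, 3.412, 4.32, 5.922, 5.65)
a_vector > 4     # a comparison returns a logical vector
```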

    Classes and types

    Factor

    gender <- factor(c("Male", "Female", "Female", "Male"))
    
    class(gender)
    [1] "factor"

    Combining different types of vectors

    Lists

    my_list <- list(name = "Alice", age = 25, scores = c(90, 85, 88))
    
    print(my_list)
    $name
    [1] "Alice"
    
    $age
    [1] 25
    
    $scores
    [1] 90 85 88

    Why do we care about classes and types?

    Many functions, including plotting functions, behave differently depending on the class of a variable (we will come back to this later)

    Dataframe

    # Create a data frame
    df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    
    # Print it
    print(df)
      x y
    1 1 a
    2 2 b
    3 3 c
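A data frame can hold columns of different classes, for example a numeric and a factor column. A minimal sketch with a made-up data frame `people`, showing a few standard ways to inspect it:

```r
people <- data.frame(
  age    = c(25, 31, 47, 22),
  gender = factor(c("Male", "Female", "Female", "Male"))
)

str(people)             # class of every column
people$age              # select one column with $
people[2, ]             # second row, all columns
nrow(people)            # number of rows: 4
levels(people$gender)   # "Female" "Male"
table(people$gender)    # counts per level
```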

    Tibble

    # Create a tibble
    library(tibble)
    tb <- tibble(x = 1:3, y = c("a", "b", "c"))
    
    print(tb)
    # A tibble: 3 × 2
          x y    
      <int> <chr>
    1     1 a    
    2     2 b    
    3     3 c    

    Packages

    Packages give you access to additional sets of functions

    # install.packages("tidyverse")  # only needed once per computer
    library(tidyverse)

    Run “library()” in every new R session in which you want to use functions from this package

    Let’s try!

    Get your data into R

    Organize your data

    project_dir

    Organize your data

    Choose your working directory

    getwd() 
    [1] "/Users/jb22m516/Documents/GitHub/getting_started_with_R/slides"
    setwd("/Users/jb22m516/Documents/GitHub/getting_started_with_R/") # replace with the path to your own project folder

    “Path” to file in R

    A “path” tells R where it can find your data
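Paths can be written by hand or built with `file.path()`, which joins folder and file names with "/". A minimal sketch; the folder `data` and file `my_data.csv` are hypothetical names:

```r
data_path <- file.path("data", "my_data.csv")   # hypothetical folder and file
data_path                                       # "data/my_data.csv"

# A relative path like this is looked up from the working directory (see getwd())
file.exists(data_path)   # TRUE only if the file really is there
```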

    Functions for loading data

    # utils comes with R and is loaded automatically; no installation needed
    library(utils)
    
    # read.csv2() reads files with ";" between columns and "," as decimal mark
    # add the path to your file inside the brackets
    #data = read.csv2()
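To see `read.csv2()` at work without your own file, a minimal sketch that first writes a tiny made-up semicolon-separated file to a temporary location and then reads it back. For files with "," between columns and "." as decimal mark, `read.csv()` is the matching function.

```r
# Write a small example file (made-up data) to a temporary location
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("id;height;group",
             "1;1,72;A",
             "2;1,65;B",
             "3;1,80;A"), tmp_file)

# read.csv2() uses ";" as separator and "," as decimal mark by default
my_data <- read.csv2(tmp_file)
my_data
str(my_data)   # height is read as numeric thanks to dec = ","
```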

    Types of data in R

    tibble / dataframe / matrix / array

    Introduction to simple plots